PERSISTENCE now only includes those cases within 1000 characters before the actual adjective, so adjust the PERSIST_FORM variable accordingly to only count those cases:
x$PERSIST_FORM.1000 <- x$PERSIST_FORM
x$PERSIST_FORM.1000[x$PERSIST_DIST > 1000] <- 'none'
Look at COMPARISON: 26.2% analytic, 73.8% synthetic
explorer.cat(x$COMPARISON)
$`Missing data`
[1] FALSE
$Frequencies
analytic synthetic
117 329
$Percentages
analytic synthetic
0.262 0.738
$`Freqs of freqs`
117 329
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
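The frequency and percentage lines that explorer.cat() reports (a helper function by Stefan Th. Gries, not shown here) boil down to simple counting and normalizing. A language-neutral illustration, sketched in Python rather than the original R helper:

```python
from collections import Counter

def cat_summary(values):
    """Frequencies and rounded percentages of a categorical variable,
    similar in spirit to the first lines of explorer.cat()."""
    freqs = Counter(values)
    n = sum(freqs.values())
    percs = {level: round(count / n, 3) for level, count in freqs.items()}
    return freqs, percs

# the COMPARISON counts reported above: 117 analytic, 329 synthetic
comparison = ["analytic"] * 117 + ["synthetic"] * 329
freqs, percs = cat_summary(comparison)
print(freqs["analytic"], freqs["synthetic"])  # 117 329
print(percs["analytic"], percs["synthetic"])  # 0.262 0.738
```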
2.2 Every Other Variable
Look at ADJ_LEN: 1st Qu = Median, but that doesn’t change in any of the suggested transformations, so leave it as is
explorer.num(x$ADJ_LEN)
$`Special data points`
Missing data Zeros Negatives
FALSE FALSE FALSE
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.000 4.000 4.000 5.193 6.000 13.000
$`Types and tokens`
Types Tokens
11 446
$`Freqs of freqs`
0 1 9 14 24 28 39 70 82 164
1 2 1 2 1 1 1 1 1 1
$`Smallest 'meaningful' difference`
[1] 1
Look at ADVMOD: 92.2% no vs. 7.8% yes, very few adjectives are adverbially modified
explorer.cat(x$ADVMOD)
$`Missing data`
[1] FALSE
$Frequencies
n y
411 35
$Percentages
n y
0.922 0.078
$`Freqs of freqs`
35 411
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
Look at COMPL: 2.5% to-infinitive, 89.2% no complement, 5.2% prepositional phrase, 3.1% than-phrase; few complements overall; conflate to to-infinitive and prepositional phrase vs. no complement and than-phrase (see Mondorf (2014) for why than-phrases are not complements).
explorer.cat(x$COMPL)
$`Missing data`
[1] FALSE
$Frequencies
i n p t
11 398 23 14
$Percentages
i n p t
0.025 0.892 0.052 0.031
$`Freqs of freqs`
11 14 23 398
1 1 1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
4 446
Look at COMPL.confl: 7.6% yes vs. 92.4% no
explorer.cat(x$COMPL.confl)
$`Missing data`
[1] FALSE
$Frequencies
y n
34 412
$Percentages
y n
0.076 0.924
$`Freqs of freqs`
34 412
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
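The conflation described above is just a relabelling of factor levels. A hypothetical Python sketch of the same mapping (level codes i/n/p/t as in the explorer.cat() output; the dictionary name is mine, for illustration only):

```python
from collections import Counter

# to-infinitive (i) and prepositional phrase (p) become 'y',
# no complement (n) and than-phrase (t) become 'n'
CONFLATE = {"i": "y", "p": "y", "n": "n", "t": "n"}

# counts as reported for COMPL above
compl = ["i"] * 11 + ["n"] * 398 + ["p"] * 23 + ["t"] * 14
compl_confl = [CONFLATE[level] for level in compl]
print(Counter(compl_confl))  # n: 412, y: 34
```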
Look at DPNOFREQ: looks okay
explorer.num(x$DPNOFREQ)
$`Special data points`
Missing data Zeros Negatives
FALSE FALSE FALSE
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000938 0.0212153 0.0323701 0.0787586 0.1563902 0.2237505
$`Types and tokens`
Types Tokens
149 446
$`Freqs of freqs`
0 1 2 3 4 5 7 9 10 11 12 18 26 38 41
1 84 32 7 4 6 4 3 3 1 1 1 1 1 1
$`Smallest 'meaningful' difference`
[1] 5.49e-06
Look at FINAL_SEGMENT: fine
explorer.num(x$FINAL_SEGMENT)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:100" "FALSE"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2500 0.5000 0.4105 0.6250 1.0000
$`Types and tokens`
Types Tokens
14 446
$`Freqs of freqs`
0 2 11 12 13 17 19 20 43 80 100 102
1 2 1 2 2 1 1 1 1 1 1 1
$`Smallest 'meaningful' difference`
[1] 0.04166667
Look at FORM: 49.1% comparative vs. 50.9% superlative
explorer.cat(x$FORM)
$`Missing data`
[1] FALSE
$Frequencies
comparative superlative
219 227
$Percentages
comparative superlative
0.491 0.509
$`Freqs of freqs`
219 227
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
Look at LEXDIV: one pretty low value, winsorize to lowest value in boxplot that is not an outlier (-43.82754)
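Winsorizing to the boxplot's lower whisker can be sketched as follows (a Python illustration assuming the usual 1.5×IQR outlier rule that R's boxplot() uses; the helper names and toy data are made up, and R's quantile algorithm differs slightly from Python's):

```python
import statistics

def lower_whisker(values):
    """Lowest observation that is not an outlier under the 1.5*IQR rule."""
    s = sorted(values)
    q1, _, q3 = statistics.quantiles(s, n=4)
    fence = q1 - 1.5 * (q3 - q1)
    return min(v for v in s if v >= fence)

def winsorize_low(values, bound):
    """Pull every value below the bound up to the bound."""
    return [max(v, bound) for v in values]

data = [10, 12, 11, 13, 9, 14, -50]   # toy data with one extreme low value
bound = lower_whisker(data)           # 9 for this toy vector
print(winsorize_low(data, bound))     # the -50 is replaced by 9
```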
Look at NEWSPAPER: 24.2% Daily Mail vs. 27.8% Daily Mirror vs. 23.5% Daily News vs. 24.4% Independent
explorer.cat(x$NEWSPAPER)
$`Missing data`
[1] FALSE
$Frequencies
Daily Mail Daily Mirror Daily News Independent
108 124 105 109
$Percentages
Daily Mail Daily Mirror Daily News Independent
0.242 0.278 0.235 0.244
$`Freqs of freqs`
105 108 109 124
1 1 1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
4 446
Look at PERSIST_FORM.1000: 25.6% comparative vs. 49.6% none vs. 24.9% superlative
explorer.cat(x$PERSIST_FORM.1000)
$`Missing data`
[1] FALSE
$Frequencies
comparative none superlative
114 221 111
$Percentages
comparative none superlative
0.256 0.496 0.249
$`Freqs of freqs`
111 114 221
1 1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
3 446
Look at PERSISTENCE: about half of the values are 0. I could change that by raising the maximum limit of PERSIST_DIST (currently 1000 characters), but 26.8% of data points would still lack a persistence value, so this wouldn't fix it entirely; there are more negative than positive values because there are more synthetic comparisons
explorer.num(x$PERSISTENCE)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:221" "TRUE:163"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.0000 -0.6020 0.0000 -0.1638 0.0000 1.0000
$`Types and tokens`
Types Tokens
189 446
$`Freqs of freqs`
0 1 2 3 8 221
1 161 22 4 1 1
$`Smallest 'meaningful' difference`
[1] 0.001
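The PERSISTENCE measure described above can be sketched as a small function: its magnitude decays linearly from 1 (previous comparison immediately before the adjective) to 0 (1000 characters away or more), and its sign encodes the previous variant. A Python illustration (function name and toy inputs are mine, not part of the analysis):

```python
def persistence(dist, prev_variant):
    """Distance-weighted persistence score in [-1, 1]:
    negative = previous comparison was synthetic,
    positive = previous comparison was analytic,
    0 = no previous comparison within 1000 characters."""
    if dist is None or dist > 1000:
        return 0.0
    score = -(1 - dist / 1000)        # synthetic primes are negative
    if prev_variant == "analytic":
        score = -score                # analytic primes are positive
    return score

print(persistence(0, "synthetic"))    # -1.0
print(persistence(500, "analytic"))   # 0.5
print(persistence(None, None))        # 0.0
```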
Look at RHY_DIFF: a lot of zeros but none of the suggested transformations would fix that
explorer.num(x$RHY_DIFF)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:222" "TRUE:182"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1.0000 -0.3333 0.0000 -0.1026 0.0000 0.6667
$`Types and tokens`
Types Tokens
51 446
$`Freqs of freqs`
0 1 2 3 4 5 6 7 11 13 17 37 66 222
1 23 13 3 1 1 1 1 1 1 1 1 1 1
$`Smallest 'meaningful' difference`
[1] 0.00297619
Look at RHY_A: looks fine
explorer.num(x$RHY_A)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:207" "FALSE"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.1250 0.1859 0.3333 1.0000
$`Types and tokens`
Types Tokens
26 446
$`Freqs of freqs`
0 1 2 3 4 5 6 20 24 53 90 207
1 7 7 2 2 1 2 1 1 1 1 1
$`Smallest 'meaningful' difference`
[1] 0.00297619
Look at RHY_S: again a lot of zeros
explorer.num(x$RHY_S)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:336" "FALSE"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.0833 0.0000 1.0000
$`Types and tokens`
Types Tokens
24 446
$`Freqs of freqs`
0 1 2 3 5 6 14 25 33 336
1 10 7 1 1 1 1 1 1 1
$`Smallest 'meaningful' difference`
[1] 0.005952381
Look at SEG_DIFF: fine
explorer.num(x$SEG_DIFF)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:37" "TRUE:277"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.17273 -0.05706 -0.03507 -0.01644 0.01389 0.32143
$`Types and tokens`
Types Tokens
264 446
$`Freqs of freqs`
0 1 2 3 4 5 6 7 37
1 175 41 22 7 4 4 2 1
$`Smallest 'meaningful' difference`
[1] 4.768e-06
Look at SEG_A: fine
explorer.num(x$SEG_A)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:10" "FALSE"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.2000 0.2547 0.2520 0.3206 0.5000
$`Types and tokens`
Types Tokens
172 446
$`Freqs of freqs`
0 1 2 3 4 5 6 7 8 9 10 11 13 16 24
1 95 27 17 10 5 1 4 4 2 2 1 2 1 1
$`Smallest 'meaningful' difference`
[1] 2.3527e-05
Look at SEG_S: fine
explorer.num(x$SEG_S)
$`Special data points`
Missing data Zeros Negatives
"FALSE" "TRUE:19" "FALSE"
$Summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.1667 0.2301 0.2355 0.3079 0.6250
$`Types and tokens`
Types Tokens
175 446
$`Freqs of freqs`
0 1 2 3 4 5 6 7 8 10 11 12 16 19
1 103 30 12 6 6 3 3 1 2 4 1 2 2
$`Smallest 'meaningful' difference`
[1] 1.8038e-05
Look at STRESS_LAST_SYLL: 27.8% no vs. 72.2% yes
explorer.cat(x$STRESS_LAST_SYLL)
$`Missing data`
[1] FALSE
$Frequencies
n y
124 322
$Percentages
n y
0.278 0.722
$`Freqs of freqs`
124 322
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
Look at SYNT_FUN: 78% attributive vs. 5.8% nominal vs. 15.7% predicative vs. 0.4% post-nominal; conflate to predicative and post-nominal vs. attributive and nominal
explorer.cat(x$SYNT_FUN)
$`Missing data`
[1] FALSE
$Frequencies
a n p pn
348 26 70 2
$Percentages
a n p pn
0.780 0.058 0.157 0.004
$`Freqs of freqs`
2 26 70 348
1 1 1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
4 446
Look at SYNT_FUN.confl: 83.9% attributive vs. 16.1% predicative
explorer.cat(x$SYNT_FUN.confl)
$`Missing data`
[1] FALSE
$Frequencies
a p
374 72
$Percentages
a p
0.839 0.161
$`Freqs of freqs`
72 374
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
Look at VARIETY: 48.7% BrE vs. 51.3% LK
explorer.cat(x$VARIETY)
$`Missing data`
[1] FALSE
$Frequencies
BrE LK
217 229
$Percentages
BrE LK
0.487 0.513
$`Freqs of freqs`
217 229
1 1
$`Types and tokens`
types (no NAs) Tokens (no NAs)
2 446
The numeric variables: I went with ctree() because for RHY_DIFF the tree() function made too many splits to compute them, and, e.g., the 15 splits for ZIPF_FREQ are just not interpretable
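The split points a (c)tree finds for a numeric predictor are then used to cut that predictor into a small set of right-closed intervals, analogous to R's cut(). A Python sketch of that mapping, using the ADJ_LEN breaks from the analysis as an example (the function name is mine):

```python
import bisect

def cut_numeric(value, breaks, labels):
    """Assign a value to the right-closed interval (breaks[i-1], breaks[i]],
    like R's cut() with the given internal break points."""
    return labels[bisect.bisect_left(breaks, value)]

breaks = [4, 5, 7]   # internal split points for ADJ_LEN
labels = ["(0,4]", "(4,5]", "(5,7]", "(7,13]"]
print(cut_numeric(4, breaks, labels))   # (0,4]
print(cut_numeric(6, breaks, labels))   # (5,7]
print(cut_numeric(13, breaks, labels))  # (7,13]
```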
x$PREDS.NUM.rf1 <- predict(rf.1, type="prob")[,"synthetic"]
x$PREDS.CAT.rf1 <- predict(rf.1)
x$PREDS.NUM.rf1.obs <- ifelse( # the probability with which the observed variant was chosen
   x$COMPARISON=="synthetic",
   x$PREDS.NUM.rf1,
   1-x$PREDS.NUM.rf1)
logloss <- mean(-log(x$PREDS.NUM.rf1.obs)) # 0.084
(c.m <- table(OBS=x$COMPARISON, PREDS=x$PREDS.CAT.rf1)); c( # confusion matrix & its eval
   "Class. acc."=mean(x$COMPARISON==x$PREDS.CAT.rf1, na.rm=TRUE),
   "Prec. for synthetic"=c.m["synthetic","synthetic"] / sum(c.m[ ,"synthetic"]),
   "Rec. for synthetic"=c.m["synthetic","synthetic"] / sum(c.m["synthetic", ]),
   "Prec. for analytic"=c.m["analytic","analytic"] / sum(c.m[ ,"analytic"]),
   "Rec. for analytic"=c.m["analytic","analytic"] / sum(c.m["analytic", ]))
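The evaluation above computes log loss, accuracy, and per-class precision/recall from the model's predictions. The same arithmetic, stripped of the R idioms, looks like this (a hedged Python sketch on toy data, not the actual rf.1 output):

```python
import math

def log_loss(p_synthetic, observed):
    """Mean negative log of the probability assigned to the observed variant."""
    probs = [p if obs == "synthetic" else 1 - p
             for p, obs in zip(p_synthetic, observed)]
    return sum(-math.log(p) for p in probs) / len(probs)

def evaluate(obs, preds):
    """Accuracy plus per-class precision and recall from a confusion matrix."""
    classes = sorted(set(obs))
    cm = {(o, p): 0 for o in classes for p in classes}
    for o, p in zip(obs, preds):
        cm[(o, p)] += 1
    acc = sum(cm[(c, c)] for c in classes) / len(obs)
    prec = {c: cm[(c, c)] / sum(cm[(o, c)] for o in classes) for c in classes}
    rec = {c: cm[(c, c)] / sum(cm[(c, p)] for p in classes) for c in classes}
    return acc, prec, rec

obs   = ["synthetic"] * 6 + ["analytic"] * 4
preds = ["synthetic"] * 5 + ["analytic"] * 4 + ["synthetic"]
acc, prec, rec = evaluate(obs, preds)
print(acc)  # 0.8
```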
(pd.cas <- partial(      # make pd.cas contain partial dependence scores
   object=rf.1,          # from this forest
   pred.var="ADJ_LEN",   # for this predictor
   which.class=2,        # for the 2nd level of the response
   train=x,
   prob=TRUE))
b <- matrix(data=pd.corpustrigger$yhat, nrow=2)
rownames(b) <- levels(x$STRESS_LAST_SYLL)
#png("03d_pd-stresslastsyll.png", width=22.5, height=15, units="cm", res=300)
plot(x=0, ylim=c(0.5,1), xlim=c(0,1), xaxt='n', bty='n', pch='',
   xlab='Stress on the last syllable of the adjective lemma',
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   main='Partial dep. of COMPARISON on STRESS_LAST_SYLL', cex.main=1.5); grid()
axis(1, at=0:1, labels=c('no', 'yes'))
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
points(x=0:1, y=b[,1], pch=16, col=alpha('#004F86', alpha=0.7), cex=3)
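For a categorical predictor, the partial() function from the pdp package does conceptually this: for each level, set that predictor to the level in every observation, then average the model's predicted probability. A hedged sketch of the idea (toy model and names of my own, not rf.1):

```python
def partial_dependence(rows, feature, levels, predict_prob):
    """For each level, force `feature` to that level in every row
    and average the model's predicted probability."""
    pd_scores = {}
    for level in levels:
        modified = [{**row, feature: level} for row in rows]
        pd_scores[level] = sum(predict_prob(r) for r in modified) / len(rows)
    return pd_scores

# toy 'model': stress on the last syllable raises P(synthetic) by 0.25
def toy_predict(row):
    return 0.5 + (0.25 if row["STRESS_LAST_SYLL"] == "y" else 0.0)

rows = [{"STRESS_LAST_SYLL": "y"}, {"STRESS_LAST_SYLL": "n"}]
print(partial_dependence(rows, "STRESS_LAST_SYLL", ["n", "y"], toy_predict))
# {'n': 0.5, 'y': 0.75}
```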
Mondorf, B. (2014). Apparently competing motivations in morpho-syntactic variation. In E. A. Moravcsik, A. Malchukov, & B. MacWhinney (Eds.), Competing motivations in grammar and usage (pp. 209–228). Oxford University Press.
Source Code
---
title: "Adjectives in SLE"
author: "Nina Funke"
date: today
date-format: "DD MMM YYYY"
format:
  html:
    page-layout: full
    code-fold: false
    code-link: true
    code-copy: true
    code-tools: true
    code-line-numbers: true
    code-overflow: scroll
    number-sections: true
    smooth-scroll: true
    toc: true
    toc-title: Table of Contents
    toc-depth: 4
    number-depth: 4
    toc-expand: true
    toc-location: left
    monofont: lucida console
    tbl-cap-location: top
    fig-cap-location: bottom
    fig-width: 8
    fig-height: 8
    fig-format: png
    fig-dpi: 300
    fig-align: center
    embed-resources: true
    link-external-newwindow: true
bibliography: C:/Users/si2687local/Desktop/references.bib
csl: C:/Users/si2687local/Desktop/APA.csl
execute:
  cache: false
  echo: true
  eval: true
  warning: false
crossref:
  thm-title: Formula
  thm-prefix: Formula
---

# Data Preparation

Load packages

```{r}
rm(list=ls(all=TRUE))
source("_helpers/explorer.cat.r") # <1>
source("_helpers/explorer.num.r") # <1>
library(Boruta); library(caret); library(dplyr); library(MASS); library(partykit); library(pdp); library(randomForest); library(tree)
```

1. Both functions are created by [Stefan Th. Gries](https://stgries.info).

Load the data

```{r}
x <- read.delim(
   file="01a_adj.txt",
   stringsAsFactors=TRUE,
   dec=',')
```

Add persistence measure: positive values show that the previous comparison was *analytic* and negative values show that it was *synthetic*

```{r}
x$PERSIST_DIST[x$PERSIST_DIST=="none"] <- NA
x$PERSIST_DIST <- as.numeric(as.character(x$PERSIST_DIST))
```

```{r}
x$PERSISTENCE <- -(1-(x$PERSIST_DIST/1000))
x$PERSISTENCE[is.na(x$PERSISTENCE)] <- 0
x$PERSISTENCE[x$PERSIST_DIST>1000] <- 0
x$PERSISTENCE[x$PERSIST_COMP=='analytic'] <- -x$PERSISTENCE[x$PERSIST_COMP=='analytic']
```

`PERSISTENCE` now only includes those cases within 1000 characters before the actual adjective, so adjust the `PERSIST_FORM` variable accordingly to only count those cases:

```{r}
x$PERSIST_FORM.1000 <- x$PERSIST_FORM
x$PERSIST_FORM.1000[x$PERSIST_DIST > 1000] <- 'none'
```

Add the rhythm differences: if positive, the analytic pattern was better; if negative, the synthetic pattern was better

```{r}
x$RHY_A <- ifelse(x$COMPARISON=='analytic', x$RHY_SCORE_POS, x$RHY_SCORE_ALT_POS)
x$RHY_S <- ifelse(x$COMPARISON=='synthetic', x$RHY_SCORE_POS, x$RHY_SCORE_ALT_POS)
x$RHY_DIFF <- x$RHY_S - x$RHY_A
```

Add the segment differences: if positive, the analytic pattern was better; if negative, the synthetic pattern was better

```{r}
x$SEG_A <- ifelse(x$COMPARISON=='analytic', x$SEG_SCORE, x$SEG_SCORE_ALT)
x$SEG_S <- ifelse(x$COMPARISON=='synthetic', x$SEG_SCORE, x$SEG_SCORE_ALT)
x$SEG_DIFF <- x$SEG_S - x$SEG_A
```

```{r}
summary(x)
```

# Data Exploration

The variables initially included are: `ADJ_LEN`, `ADVMOD`, `COMPARISON`, `COMPL`, `DPNOFREQ`, `FINAL_SEGMENT`, `FORM`, `LEXDIV`, `NEWSPAPER`, `PERSIST_FORM.1000`, `PERSISTENCE`, `READABILITY`, `RHY_DIFF`, `RHY_A`, `RHY_S`, `SEG_DIFF`, `SEG_A`, `SEG_S`, `STRESS_LAST_SYLL`, `SYNT_FUN`, `VARIETY`, `WORD_COUNT`, `ZIPF_FREQ`

## Dependent Variable

Look at `COMPARISON`: 26.2% *analytic*, 73.8% *synthetic*

```{r}
explorer.cat(x$COMPARISON)
```

## Every Other Variable

Look at `ADJ_LEN`: 1st Qu = Median, but that doesn't change in any of the suggested transformations, so leave it as is

```{r}
explorer.num(x$ADJ_LEN)
```

Look at `ADVMOD`: 92.2% *no* vs. 7.8% *yes*, very few adjectives are adverbially modified

```{r}
explorer.cat(x$ADVMOD)
```

Look at `COMPL`: 2.5% *to-infinitive*, 89.2% *no complement*, 5.2% *prepositional phrase*, 3.1% *than-phrase*; few complements overall; conflate to *to-infinitive* and *prepositional phrase* vs. *no complement* and *than-phrase* (see @Mondorf.2014 for why *than-phrases* are not complements).

```{r}
explorer.cat(x$COMPL)
tree(x$COMPARISON ~ x$COMPL)
x$COMPL.confl <- x$COMPL
levels(x$COMPL.confl) <- c('y', 'n', 'y', 'n')
```

Look at `COMPL.confl`: 7.6% *yes* vs. 92.4% *no*

```{r}
explorer.cat(x$COMPL.confl)
```

Look at `DPNOFREQ`: looks okay

```{r}
explorer.num(x$DPNOFREQ)
```

Look at `FINAL_SEGMENT`: fine

```{r}
explorer.num(x$FINAL_SEGMENT)
```

Look at `FORM`: 49.1% *comparative* vs. 50.9% *superlative*

```{r}
explorer.cat(x$FORM)
```

Look at `LEXDIV`: one pretty low value, winsorize to lowest value in boxplot that is not an outlier (-43.82754)

```{r}
explorer.num(x$LEXDIV)
boxplot(x$LEXDIV)$stats[1,1] # -43.82754
x$LEXDIV.win <- x$LEXDIV
x$LEXDIV.win[x$LEXDIV.win < -43.82754] <- -43.82754
```

Look at `LEXDIV.win`: better

```{r}
explorer.num(x$LEXDIV.win)
```

Look at `NEWSPAPER`: 24.2% *Daily Mail* vs. 27.8% *Daily Mirror* vs. 23.5% *Daily News* vs. 24.4% *Independent*

```{r}
explorer.cat(x$NEWSPAPER)
```

Look at `PERSIST_FORM.1000`: 25.6% *comparative* vs. 49.6% *none* vs. 24.9% *superlative*

```{r}
explorer.cat(x$PERSIST_FORM.1000)
```

Look at `PERSISTENCE`: about half of the values are 0. I could change that by raising the maximum limit of PERSIST_DIST (currently 1000 characters), but 26.8% of data points would still lack a persistence value, so this wouldn't fix it entirely; there are more negative than positive values because there are more *synthetic* comparisons

```{r}
explorer.num(x$PERSISTENCE)
```

Look at `READABILITY`: log it

```{r}
explorer.num(x$READABILITY)
x$READABILITY.log <- log2(abs(x$READABILITY)) * sign(x$READABILITY)
```

Look at `READABILITY.log`: better

```{r}
explorer.num(x$READABILITY.log)
```

Look at `RHY_DIFF`: a lot of zeros but none of the suggested transformations would fix that

```{r}
explorer.num(x$RHY_DIFF)
```

Look at `RHY_A`: looks fine

```{r}
explorer.num(x$RHY_A)
```

Look at `RHY_S`: again a lot of zeros

```{r}
explorer.num(x$RHY_S)
```

Look at `SEG_DIFF`: fine

```{r}
explorer.num(x$SEG_DIFF)
```

Look at `SEG_A`: fine

```{r}
explorer.num(x$SEG_A)
```

Look at `SEG_S`: fine

```{r}
explorer.num(x$SEG_S)
```

Look at `STRESS_LAST_SYLL`: 27.8% *no* vs. 72.2% *yes*

```{r}
explorer.cat(x$STRESS_LAST_SYLL)
```

Look at `SYNT_FUN`: 78% *attributive* vs. 5.8% *nominal* vs. 15.7% *predicative* vs. 0.4% *post-nominal*; conflate to *predicative* and *post-nominal* vs. *attributive* and *nominal*

```{r}
explorer.cat(x$SYNT_FUN)
tree(x$COMPARISON ~ x$SYNT_FUN)
x$SYNT_FUN.confl <- x$SYNT_FUN
levels(x$SYNT_FUN.confl) <- c('a', 'a', 'p', 'p')
```

Look at `SYNT_FUN.confl`: 83.9% *attributive* vs. 16.1% *predicative*

```{r}
explorer.cat(x$SYNT_FUN.confl)
```

Look at `VARIETY`: 48.7% *BrE* vs. 51.3% *LK*

```{r}
explorer.cat(x$VARIETY)
```

Look at `WORD_COUNT`: log

```{r}
explorer.num(x$WORD_COUNT)
x$WORD_COUNT.log <- log2(x$WORD_COUNT)
```

Look at `WORD_COUNT.log`: better

```{r}
explorer.num(x$WORD_COUNT.log)
```

Look at `ZIPF_FREQ`: skewed to the left... boxcox

```{r}
explorer.num(x$ZIPF_FREQ)
b <- boxcox(lm(x$ZIPF_FREQ ~ 1))
(lambda <- b$x[which.max(b$y)])
x$ZIPF_FREQ.trans <- (x$ZIPF_FREQ^lambda - 1)/lambda
```

Look at `ZIPF_FREQ.trans`:

```{r}
explorer.num(x$ZIPF_FREQ.trans)
```

# Manually add the interaction terms

The categorical variables

```{r}
x$VARIETYxADVMOD <- x$VARIETY:x$ADVMOD
x$VARIETYxCOMPL.confl <- x$VARIETY:x$COMPL.confl
x$VARIETYxFORM <- x$VARIETY:x$FORM
x$VARIETYxNEWSPAPER <- x$VARIETY:x$NEWSPAPER
x$VARIETYxPERSIST_FORM.1000 <- x$VARIETY:x$PERSIST_FORM.1000
x$VARIETYxSTRESS_LAST_SYLL <- x$VARIETY:x$STRESS_LAST_SYLL
x$VARIETYxSYNT_FUN.confl <- x$VARIETY:x$SYNT_FUN.confl
```

The numeric variables: I went with ctree() because for `RHY_DIFF` the tree() function made too many splits to compute them, and, e.g., the 15 splits for `ZIPF_FREQ` are just not interpretable

1. `ADJ_LEN`:

```{r}
#(t1 <- tree(COMPARISON ~ ADJ_LEN, data=x)) # 4 levels
plot(t1 <- ctree(COMPARISON ~ ADJ_LEN, data=x)) # 4 levels
```

Split `ADJ_LEN` into 4 levels:

```{r}
x$ADJ_LEN.cat <- cut(x$ADJ_LEN, c(-Inf, 4, 5, 7, Inf), labels=c('(0,4]', '(4,5]', '(5,7]', '(7,13]'))
```

2. `DPNOFREQ`:

```{r}
#(t2 <- tree(COMPARISON ~ DPNOFREQ, data=x)) # 15 levels
plot(t2 <- ctree(COMPARISON ~ DPNOFREQ, data=x)) # 3 levels
```

Split `DPNOFREQ` into 3 levels:

```{r}
x$DPNOFREQ.cat <- cut(x$DPNOFREQ, c(-Inf, 0.017, 0.162, Inf), labels=c('[0,0.017]', '(0.017,0.162]', '(0.162,1]'))
```

3. `FINAL_SEGMENT`:

```{r}
#(t3 <- tree(COMPARISON ~ FINAL_SEGMENT, data=x)) # 5 levels
plot(t3 <- ctree(COMPARISON ~ FINAL_SEGMENT, data=x))
```

Split `FINAL_SEGMENT` into 5 levels:

```{r}
x$FINAL_SEGMENT.cat <- cut(x$FINAL_SEGMENT, c(-Inf, 0, 0.333, 0.625, 0.667, Inf), labels=c('0', '(0,0.333]', '(0.333,0.625]', '(0.625,0.667]', '(0.667,1]'))
```

4. `LEXDIV.win`:

```{r}
#(t3 <- tree(COMPARISON ~ LEXDIV.win, data=x)) # 7 levels
plot(t3 <- ctree(COMPARISON ~ LEXDIV.win, data=x))
```

No splits for `LEXDIV.win`

5. `PERSISTENCE`:

```{r}
#(t4 <- tree(COMPARISON ~ PERSISTENCE, data=x)) # 2 levels
plot(t4 <- ctree(COMPARISON ~ PERSISTENCE, data=x)) # 2 levels
```

Split `PERSISTENCE` into 2 levels:

```{r}
x$PERSISTENCE.cat <- cut(x$PERSISTENCE, c(-Inf, 0.587, Inf), labels=c('[-1,0.587]', '(0.587,1]'))
```

6. `READABILITY.log`:

```{r}
#(t5 <- tree(COMPARISON ~ READABILITY, data=x)) # 3 levels
plot(t5 <- ctree(COMPARISON ~ READABILITY.log, data=x))
```

No split for `READABILITY.log`

7. `RHY_DIFF`:

```{r}
#(t6 <- tree(COMPARISON ~ RHY_DIFF, data=x))
plot(t6 <- ctree(COMPARISON ~ RHY_DIFF, data=x)) # 4 levels
```

Using ctree(), split `RHY_DIFF` into 4 levels:

```{r}
x$RHY_DIFF.cat <- cut(x$RHY_DIFF, c(-Inf, -0.33, -0.25, -0.054, Inf), labels=c('[-1,-0.33]', '(-0.33,-0.25]', '(-0.25,-0.054]', '(-0.054,1]'))
```

8. `RHY_A`:

```{r}
#(t7 <- tree(COMPARISON ~ RHY_A, data=x)) # 7 levels
plot(t7 <- ctree(COMPARISON ~ RHY_A, data=x)) # 4 levels
```

Split `RHY_A` into 4 levels:

```{r}
x$RHY_A.cat <- cut(x$RHY_A, c(-Inf, 0.179, 0.225, 0.33, Inf), labels=c('[0,0.179]', '(0.179,0.225]', '(0.225,0.33]', '(0.33,1]'))
```

9. `RHY_S`:

```{r}
#(t8 <- tree(COMPARISON ~ RHY_S, data=x)) # 6 levels
plot(t8 <- ctree(COMPARISON ~ RHY_S, data=x))
```

Split `RHY_S` into 2 levels:

```{r}
x$RHY_S.cat <- cut(x$RHY_S, c(-Inf, 0.217, Inf), labels=c('[0,0.217]', '(0.217,1]'))
```

10. `SEG_DIFF`:

```{r}
(t9 <- tree(COMPARISON ~ SEG_DIFF, data=x)) # 5 levels
plot(t9 <- ctree(COMPARISON ~ SEG_DIFF, data=x))
```

No split for `SEG_DIFF`.

11. `SEG_A`:

```{r}
(t10 <- tree(COMPARISON ~ SEG_A, data=x)) # 2 levels
plot(t10 <- ctree(COMPARISON ~ SEG_A, data=x))
```

No split for `SEG_A`.

12. `SEG_S`:

```{r}
#(t11 <- tree(COMPARISON ~ SEG_S, data=x)) # 9 levels
plot(t11 <- ctree(COMPARISON ~ SEG_S, data=x))
```

Split `SEG_S` into 2 levels:

```{r}
x$SEG_S.cat <- cut(x$SEG_S, c(-Inf, 0.327, Inf), labels=c('[0,0.327]', '(0.327,1]'))
```

13. `WORD_COUNT.log`:

```{r}
#(t12 <- tree(COMPARISON ~ WORD_COUNT, data=x)) # 8 levels
plot(t12 <- ctree(COMPARISON ~ WORD_COUNT.log, data=x))
```

No splits for `WORD_COUNT.log`

14. `ZIPF_FREQ.trans`:

```{r}
#(t13 <- tree(COMPARISON ~ ZIPF_FREQ, data=x)) # 15 levels
plot(t13 <- ctree(COMPARISON ~ ZIPF_FREQ.trans, data=x))
```

Split `ZIPF_FREQ.trans` into 4 levels:

```{r}
x$ZIPF_FREQ.cat <- cut(x$ZIPF_FREQ.trans, c(-Inf, 13.293, 14.756, 14.942, Inf), labels=c('[-0.5,13.293]', '(13.293,14.756]', '(14.756,14.942]', '(14.942,24]'))
```

```{r}
x$VARIETYxADJ_LEN.cat <- x$VARIETY:x$ADJ_LEN.cat
x$VARIETYxDPNOFREQ.cat <- x$VARIETY:x$DPNOFREQ.cat
x$VARIETYxFINAL_SEGMENT.cat <- x$VARIETY:x$FINAL_SEGMENT.cat
x$VARIETYxPERSISTENCE.cat <- x$VARIETY:x$PERSISTENCE.cat
x$VARIETYxRHY_DIFF.cat <- x$VARIETY:x$RHY_DIFF.cat
x$VARIETYxRHY_A.cat <- x$VARIETY:x$RHY_A.cat
x$VARIETYxRHY_S.cat <- x$VARIETY:x$RHY_S.cat
x$VARIETYxSEG_S.cat <- x$VARIETY:x$SEG_S.cat
x$VARIETYxZIPF_FREQ.cat <- x$VARIETY:x$ZIPF_FREQ.cat
```

# Model Preparation

Variables to include in the model: `ADJ_LEN`, `DPNOFREQ`, `FINAL_SEGMENT`, `LEXDIV.win`, `PERSISTENCE`, `READABILITY.log`, `RHY_DIFF`, `RHY_A`, `RHY_S`, `SEG_DIFF`, `SEG_A`, `SEG_S`, `WORD_COUNT.log`, `ZIPF_FREQ.trans`, `ADVMOD`, `COMPL.confl`, `FORM`, `NEWSPAPER`, `PERSIST_FORM.1000`, `STRESS_LAST_SYLL`, `SYNT_FUN.confl`, `VARIETY`, `VARIETYxADJ_LEN.cat`, `VARIETYxDPNOFREQ.cat`, `VARIETYxFINAL_SEGMENT.cat`, `VARIETYxPERSISTENCE.cat`, `VARIETYxRHY_DIFF.cat`, `VARIETYxRHY_A.cat`, `VARIETYxRHY_S.cat`, `VARIETYxSEG_S.cat`, `VARIETYxZIPF_FREQ.cat`, `VARIETYxADVMOD`, `VARIETYxCOMPL.confl`, `VARIETYxFORM`, `VARIETYxNEWSPAPER`, `VARIETYxPERSIST_FORM.1000`, `VARIETYxSTRESS_LAST_SYLL`, `VARIETYxSYNT_FUN.confl`

Compute the two baselines: 73.8% is the one to beat

```{r}
c("baseline 1"=baseline.1 <- max(prop.table(table(x$COMPARISON))),
  "baseline 2"=baseline.2 <- sum(prop.table(table(x$COMPARISON))^2))
```

Variable Selection

Boruta suggests: `ADJ_LEN`, `DPNOFREQ`, `FINAL_SEGMENT`, `LEXDIV.win`, `PERSISTENCE`, `READABILITY.log`, `RHY_DIFF`, `RHY_A`, `SEG_DIFF`, `SEG_A`, `SEG_S`, `WORD_COUNT.log`, `ZIPF_FREQ.trans`, `STRESS_LAST_SYLL`, `SYNT_FUN.confl`, `VARIETYxADJ_LEN.cat`, `VARIETYxDPNOFREQ.cat`, `VARIETYxFINAL_SEGMENT.cat`, `VARIETYxRHY_DIFF.cat`, `VARIETYxRHY_A.cat`, `VARIETYxRHY_S.cat`, `VARIETYxSEG_S.cat`, `VARIETYxZIPF_FREQ.cat`, `VARIETYxADVMOD`, `VARIETYxCOMPL.confl`, `VARIETYxFORM`, `VARIETYxPERSIST_FORM.1000`, `VARIETYxSTRESS_LAST_SYLL`, `VARIETYxSYNT_FUN.confl`

Exclude: `ADVMOD`, `COMPL.confl`, `FORM`, `NEWSPAPER`, `PERSIST_FORM.1000`, `RHY_S`, `VARIETY`, `VARIETYxPERSISTENCE.cat`, `VARIETYxNEWSPAPER`, `VARIETYxFORM`

```{r}
set.seed(sum(utf8ToInt("All the Young Dudes")))
predictors <- Boruta(COMPARISON ~ ADJ_LEN + ADVMOD + COMPL.confl + DPNOFREQ + FINAL_SEGMENT + FORM + LEXDIV.win + NEWSPAPER + PERSIST_FORM.1000 + PERSISTENCE + READABILITY.log + RHY_DIFF + RHY_A + RHY_S + SEG_DIFF + SEG_A + SEG_S + STRESS_LAST_SYLL + SYNT_FUN.confl + VARIETY + WORD_COUNT.log + ZIPF_FREQ.trans + VARIETYxADJ_LEN.cat + VARIETYxDPNOFREQ.cat + VARIETYxFINAL_SEGMENT.cat + VARIETYxPERSISTENCE.cat + VARIETYxRHY_DIFF.cat + VARIETYxRHY_A.cat + VARIETYxRHY_S.cat + VARIETYxSEG_S.cat + VARIETYxZIPF_FREQ.cat + VARIETYxADVMOD + VARIETYxCOMPL.confl + VARIETYxFORM + VARIETYxNEWSPAPER + VARIETYxPERSIST_FORM.1000 + VARIETYxSTRESS_LAST_SYLL + VARIETYxSYNT_FUN.confl,
   data=x, maxRuns=200)
attStats(predictors)
```

```{r}
collector <- matrix(rep(NA, 60), ncol=10, dimnames=list(
   NTREE=ntree.vals <- c(500, 1000, 1500, 2000, 2500, 3000),
   MTRY=mtry.vals <- 1:10))
for(k in seq(ntree.vals)){
   for(j in seq(mtry.vals)){
      seedy <- set.seed(sum(utf8ToInt("All the Young Dudes")))
      collector[k,j] <- randomForest(
         COMPARISON ~ ADJ_LEN + DPNOFREQ + FINAL_SEGMENT + LEXDIV.win + PERSISTENCE + READABILITY.log + RHY_DIFF + RHY_A + SEG_DIFF + SEG_A + SEG_S + STRESS_LAST_SYLL + SYNT_FUN.confl + WORD_COUNT.log + ZIPF_FREQ.trans + VARIETYxADJ_LEN.cat + VARIETYxDPNOFREQ.cat + VARIETYxFINAL_SEGMENT.cat + VARIETYxRHY_DIFF.cat + VARIETYxRHY_A.cat + VARIETYxRHY_S.cat + VARIETYxSEG_S.cat + VARIETYxZIPF_FREQ.cat + VARIETYxADVMOD + VARIETYxCOMPL.confl + VARIETYxPERSIST_FORM.1000 + VARIETYxSTRESS_LAST_SYLL + VARIETYxSYNT_FUN.confl,
         data=x, ntree=ntree.vals[k], mtry=mtry.vals[j],
         importance=TRUE)$err.rate[ntree.vals[k], "OOB"]
      #cat(seedy, ntree.vals[k], mtry.vals[j], collector[k,j], "\n", sep="\t")
   }}
mtry <- which(t(collector)==min(t(collector)), arr.ind=TRUE)[1,1]
ntree <- which(t(collector)==min(t(collector)), arr.ind=TRUE)[1,2]*500
```

Smallest OOB error in `collector` for `mtry=5` and `ntree=500`

# The Model

```{r}
set.seed(sum(utf8ToInt("All the Young Dudes")))
(rf.1 <- randomForest(COMPARISON ~ ADJ_LEN + DPNOFREQ + FINAL_SEGMENT + LEXDIV.win + PERSISTENCE + READABILITY.log + RHY_DIFF + RHY_A + SEG_DIFF + SEG_A + SEG_S + STRESS_LAST_SYLL + SYNT_FUN.confl + WORD_COUNT.log + ZIPF_FREQ.trans + VARIETYxADJ_LEN.cat + VARIETYxDPNOFREQ.cat + VARIETYxFINAL_SEGMENT.cat + VARIETYxRHY_DIFF.cat + VARIETYxRHY_A.cat + VARIETYxRHY_S.cat + VARIETYxSEG_S.cat + VARIETYxZIPF_FREQ.cat + VARIETYxADVMOD + VARIETYxCOMPL.confl + VARIETYxPERSIST_FORM.1000 + VARIETYxSTRESS_LAST_SYLL + VARIETYxSYNT_FUN.confl,
   data=x, ntree=500, mtry=5, importance=TRUE))
```

```{r}
x$PREDS.NUM.rf1 <- predict(rf.1, type="prob")[,"synthetic"]
x$PREDS.CAT.rf1 <- predict(rf.1)
x$PREDS.NUM.rf1.obs <- ifelse( # the probability with which the observed variant was chosen
   x$COMPARISON=="synthetic",
   x$PREDS.NUM.rf1,
   1-x$PREDS.NUM.rf1)
```

```{r}
logloss <- mean(-log(x$PREDS.NUM.rf1.obs)) # 0.084
```

```{r}
(c.m <- table(OBS=x$COMPARISON, PREDS=x$PREDS.CAT.rf1)); c( # confusion matrix & its eval
   "Class. acc."=mean(x$COMPARISON==x$PREDS.CAT.rf1, na.rm=TRUE),
   "Prec. for synthetic"=c.m["synthetic","synthetic"] / sum(c.m[ ,"synthetic"]),
   "Rec. for synthetic"=c.m["synthetic","synthetic"] / sum(c.m["synthetic", ]),
   "Prec. for analytic"=c.m["analytic","analytic"] / sum(c.m[ ,"analytic"]),
   "Rec. for analytic"=c.m["analytic","analytic"] / sum(c.m["analytic", ]))
```

```{r}
(varimps <- rf.1$importance)[,3:4]
```

```{r}
dotchart(sort(varimps[,1]), pch=4, xlab='Mean Decrease in Accuracy', main='Variable Importance Plot')
```

```{r}
confusionMatrix(x$PREDS.CAT.rf1, x$COMPARISON)
```

# Plots

Save variable importance scores

```{r}
#png("03a_VarIMPPlot.png", width=40, height=30, units="cm", res=300)
dotchart(sort(varimps[,1]), pch=4, xlab='Mean Decrease in Accuracy', main='Variable Importance Plot')
#dev.off()
```

## Plot the most important variables

Plot for `ADJ_LEN`:

```{r}
(pd.cas <- partial(      # make pd.cas contain partial dependence scores
   object=rf.1,          # from this forest
   pred.var="ADJ_LEN",   # for this predictor
   which.class=2,        # for the 2nd level of the response
   train=x,
   prob=TRUE))
tab.cas <- prop.table(table(x$ADJ_LEN))
#png("03b_pd-adjlen.png", width=18, height=13, units="cm", res=300)
plot(main="Partial dep. of COMPARISON on ADJ_LEN",
   type="b", pch=16,
   xlab="Adjective Length in Characters",
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   ylim=c(0,1),
   x=pd.cas$ADJ_LEN,
   y=pd.cas$yhat,
   cex=1+tab.cas*10)
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
lines(lowess(pd.cas$yhat ~ pd.cas$ADJ_LEN), lwd=6, col="#BCBCBCB0")
#dev.off()
```

Plot for `VARIETYxADJ_LEN.cat`:

```{r}
(pd.corpustrigger <- partial(object=rf.1, pred.var='VARIETYxADJ_LEN.cat', which.class=2, train=x, prob=TRUE))
b <- matrix(data=pd.corpustrigger$yhat, nrow=4)
rownames(b) <- levels(x$ADJ_LEN.cat)
colnames(b) <- levels(x$VARIETY)
#png("03c_pd-varietyxadjlen.png", width=18, height=13, units="cm", res=300)
plot(x=0, ylim=c(0.5,1), xlim=c(0,3), xaxt='n', bty='n', pch='',
   xlab='Adjective length in characters',
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   main='Partial dep. of COMPARISON on VARIETYxADJ_LEN.cat', cex.main=1.5); grid()
axis(1, at=0:3, labels=levels(x$ADJ_LEN.cat))
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
points(x=0:3, y=b[,2], pch=16, col=alpha('#004F86', alpha=0.7), cex=3)
points(x=0:3, y=b[,1], pch=16, col=alpha('#7BC8FF', alpha=0.7), cex=3)
legend('top', legend=c('BrE', 'SLE'), fill=c('#7BC8FF', '#004F86'), ncol=2, xjust=0.5, yjust=0.5)
#dev.off()
```

Plot for `STRESS_LAST_SYLL`:

```{r}
(pd.corpustrigger <- partial(object=rf.1, pred.var='STRESS_LAST_SYLL', which.class=2, train=x, prob=TRUE))
b <- matrix(data=pd.corpustrigger$yhat, nrow=2)
rownames(b) <- levels(x$STRESS_LAST_SYLL)
#png("03d_pd-stresslastsyll.png", width=22.5, height=15, units="cm", res=300)
plot(x=0, ylim=c(0.5,1), xlim=c(0,1), xaxt='n', bty='n', pch='',
   xlab='Stress on the last syllable of the adjective lemma',
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   main='Partial dep. of COMPARISON on STRESS_LAST_SYLL', cex.main=1.5); grid()
axis(1, at=0:1, labels=c('no', 'yes'))
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
points(x=0:1, y=b[,1], pch=16, col=alpha('#004F86', alpha=0.7), cex=3)
#dev.off()
```

Plot for `VARIETYxSTRESS_LAST_SYLL`:

```{r}
(pd.corpustrigger <- partial(object=rf.1, pred.var='VARIETYxSTRESS_LAST_SYLL', which.class=2, train=x, prob=TRUE))
b <- matrix(data=pd.corpustrigger$yhat, nrow=2)
rownames(b) <- levels(x$STRESS_LAST_SYLL)
colnames(b) <- levels(x$VARIETY)
#png("03e_pd-varietyxstresslastsyll.png", width=18, height=13, units="cm", res=300)
plot(x=0, ylim=c(0.5,1), xlim=c(0,1), xaxt='n', bty='n', pch='',
   xlab='Stress on the last syllable of the adjective lemma',
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   main='Partial dep. of COMPARISON on VARIETYxSTRESS_LAST_SYLL', cex.main=1); grid()
axis(1, at=0:1, labels=c('no', 'yes'))
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
points(x=0:1, y=b[,2], pch=16, col=alpha('#004F86', alpha=0.7), cex=3)
points(x=0:1, y=b[,1], pch=16, col=alpha('#7BC8FF', alpha=0.7), cex=3)
legend('top', legend=c('BrE', 'SLE'), fill=c('#7BC8FF', '#004F86'), ncol=2, xjust=0.5, yjust=0.5)
#dev.off()
```

Plot for `VARIETYxZIPF_FREQ.cat`:

```{r}
(pd.corpustrigger <- partial(object=rf.1, pred.var='VARIETYxZIPF_FREQ.cat', which.class=2, train=x, prob=TRUE))
b <- matrix(data=pd.corpustrigger$yhat, nrow=4)
rownames(b) <- levels(x$ZIPF_FREQ.cat)
colnames(b) <- levels(x$VARIETY)
#png("03f_pd-varietyxfrequency.png", width=18, height=13, units="cm", res=300)
plot(x=0, ylim=c(0.5,1), xlim=c(-0.2,3.2), xaxt='n', bty='n', pch='',
   xlab='Frequency of the Adjective Lemma',
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   main='Partial dep. of COMPARISON on VARIETYxZIPF_FREQ.cat', cex.main=1); grid()
axis(1, at=0:3, labels=c("[0,13.293]", levels(x$ZIPF_FREQ.cat)[2:4]))
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
points(x=c(0.05, 1.05, 2.05, 3.05), y=b[,2], pch=16, col=alpha('#004F86', alpha=0.7), cex=3)
points(x=c(-0.05, 0.95, 1.95, 2.95), y=b[,1], pch=16, col=alpha('#7BC8FF', alpha=0.7), cex=3)
legend('top', legend=c('BrE', 'SLE'), fill=c('#7BC8FF', '#004F86'), ncol=2, xjust=0.5, yjust=0.5)
#dev.off()
```

Plot for `VARIETYxFINAL_SEGMENT.cat`:

```{r}
(pd.corpustrigger <- partial(object=rf.1, pred.var='VARIETYxFINAL_SEGMENT.cat', which.class=2, train=x, prob=TRUE))
b <- matrix(data=pd.corpustrigger$yhat, nrow=5)
rownames(b) <- levels(x$FINAL_SEGMENT.cat)
colnames(b) <- levels(x$VARIETY)
#png("03g_pd-varietyxfinalsegment.png", width=18, height=13, units="cm", res=300)
plot(x=0, ylim=c(0.5,1), xlim=c(-0.2,4.2), xaxt='n', bty='n', pch='',
   xlab='Similarity of the Final Segment to the Synthetic Ending',
   ylab=substitute(paste('Probability of ', italic('synthetic'), ' comparison')),
   main='Partial dep. of COMPARISON on VARIETYxFINAL_SEGMENT.cat', cex.main=1); grid()
axis(1, at=0:4, labels=levels(x$FINAL_SEGMENT.cat))
#abline(h=partial(object=rf.1, pred.var='VARIETY', which.class=2, train=x, prob=TRUE)[1,2], lty=2)
abline(h=sum(x$COMPARISON=='synthetic')/nrow(x), lty=2)
points(x=c(0.05,1.05,2.05,3.05,4.05), y=b[,2], pch=16, col=alpha('#004F86', alpha=0.7), cex=3)
points(x=c(-0.05,0.95,1.95,2.95,3.95), y=b[,1], pch=16, col=alpha('#7BC8FF', alpha=0.7), cex=3)
legend('top', legend=c('BrE', 'SLE'), fill=c('#7BC8FF', '#004F86'), ncol=2, xjust=0.5, yjust=0.5)
#dev.off()
```

```{r}
sessionInfo()
```

<!-- R version 4.3.0 (2023-04-21 ucrt) -->
<!-- Platform: x86_64-w64-mingw32/x64 (64-bit) -->
<!-- Running under: Windows 10 x64 (build 19045) -->
<!-- Matrix products: default -->
<!-- locale: -->
<!-- [1] LC_COLLATE=German_Germany.utf8 LC_CTYPE=German_Germany.utf8 LC_MONETARY=German_Germany.utf8 -->
<!-- [4] LC_NUMERIC=C LC_TIME=German_Germany.utf8 -->
<!-- time zone: Europe/Berlin -->
<!-- tzcode source: internal -->
<!-- attached base packages: -->
<!-- [1] grid stats graphics grDevices utils datasets methods base -->
<!-- other attached packages: -->
<!-- [1] car_3.1-2 carData_3.0-5 dplyr_1.1.2 pdp_0.8.1 -->
<!-- [5] caret_6.0-94 lattice_0.21-8 ggplot2_3.4.2 tree_1.0-43 -->
<!-- [9] partykit_1.2-20 mvtnorm_1.2-2 libcoin_1.0-9 randomForest_4.7-1.1 -->
<!-- loaded via a namespace (and not attached): -->
<!-- [1] gtable_0.3.3 xfun_0.39 recipes_1.0.6 vctrs_0.6.3 -->
<!-- [5] tools_4.3.0 generics_0.1.3 stats4_4.3.0 parallel_4.3.0 -->
<!-- [9] proxy_0.4-27 tibble_3.2.1 fansi_1.0.4 ModelMetrics_1.2.2.2 -->
<!-- [13] pkgconfig_2.0.3 Matrix_1.5-4 data.table_1.14.8 lifecycle_1.0.3 -->
<!-- [17] farver_2.1.1 compiler_4.3.0 stringr_1.5.0 munsell_0.5.0 -->
<!-- [21] codetools_0.2-19 class_7.3-21 prodlim_2023.03.31 Formula_1.2-5 -->
<!-- [25] pillar_1.9.0 MASS_7.3-58.4 gower_1.0.1 iterators_1.0.14 -->
<!-- [29] abind_1.4-5 rpart_4.1.19 foreach_1.5.2 nlme_3.1-162 -->
<!-- [33] parallelly_1.36.0 lava_1.7.2.1 tidyselect_1.2.0 digest_0.6.31 -->
<!-- [37] stringi_1.7.12 inum_1.0-5 future_1.33.0 reshape2_1.4.4 -->
<!-- [41] purrr_1.0.1 listenv_0.9.0 splines_4.3.0 colorspace_2.1-0 -->
<!-- [45] cli_3.6.1 magrittr_2.0.3 survival_3.5-5 utf8_1.2.3 -->
<!-- [49] e1071_1.7-13 future.apply_1.11.0 withr_2.5.0 scales_1.2.1 -->
<!-- [53] lubridate_1.9.2 timechange_0.2.0 globals_0.16.2 nnet_7.3-18 -->
<!-- [57] timeDate_4022.108 knitr_1.43 hardhat_1.3.0 rlang_1.1.1 -->
<!-- [61] Rcpp_1.0.10 glue_1.6.2 pROC_1.18.4 ipred_0.9-14 -->
<!-- [65] rstudioapi_0.14 R6_2.5.1 plyr_1.8.8 -->